Making Progress Based on False Discoveries
We consider the question of adaptive data analysis within the framework of
convex optimization. We ask how many samples are needed in order to compute
ε-accurate estimates of gradients queried by
gradient descent, and we provide two intermediate answers to this question.
First, we show that for a general analyst (not necessarily gradient descent)
samples are required. This rules out the possibility of
a foolproof mechanism. Our construction builds upon a new lower bound (that may
be of interest in its own right) for an analyst that may ask several
non-adaptive questions in a batch over a fixed and known number of rounds of
adaptivity and
requires a fraction of true discoveries. We show that for such an analyst
samples are necessary.
Second, we show that, under certain assumptions on the oracle, in an
interaction with gradient descent samples are
necessary. Our assumptions are that the oracle has only \emph{first order
access} and is \emph{post-hoc generalizing}. First order access means that it
can only compute the gradients of the sampled function at points queried by the
algorithm. Our assumption of \emph{post-hoc generalization} follows from
existing lower bounds for statistical queries. More generally, then, we provide
a generic reduction from the standard setting of statistical queries to the
problem of estimating gradients queried by gradient descent.
These results are in contrast with classical bounds showing that, with enough
samples, one can optimize the population risk to the corresponding accuracy,
but, as it turns out, with spurious gradients.
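As an illustrative toy sketch (not taken from the paper), the interaction the abstract describes can be pictured as gradient descent querying a first-order oracle that only ever returns empirical gradients at the queried points. All names, the loss function, and the sample sizes below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: minimize the population risk F(x) = E_z[0.5 * ||x - z||^2]
# using gradients estimated from n samples of z.
n, d = 100, 5
samples = rng.normal(loc=1.0, scale=1.0, size=(n, d))  # z ~ N(1, I)

def oracle(x):
    """First-order oracle: returns only the empirical gradient at the
    queried point x; it never reveals the samples themselves."""
    return np.mean(x - samples, axis=0)  # gradient of 0.5*||x - z||^2, averaged

x = np.zeros(d)
eta, T = 0.1, 200
for _ in range(T):
    # Each query is adaptive: the queried point depends on all past answers.
    x = x - eta * oracle(x)

# With enough samples the iterate lands near the population minimizer E[z];
# with too few samples, the same estimates can be spurious and mislead descent.
print(np.round(x, 2))
```

The adaptivity question in the abstract is exactly about how large `n` must be so that answers to many such adaptively chosen queries remain accurate for the population risk.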
Improper Learning by Refuting
The sample complexity of learning a Boolean-valued function class is precisely characterized by its Rademacher complexity. This has little bearing, however, on the sample complexity of efficient agnostic learning.
We introduce refutation complexity, a natural computational analog of Rademacher complexity of a Boolean concept class and show that it exactly characterizes the sample complexity of efficient agnostic learning. Informally, refutation complexity of a class C is the minimum number of example-label pairs required to efficiently distinguish between the case that the labels correlate with the evaluation of some member of C (structure) and the case where the labels are i.i.d. Rademacher random variables (noise). The easy direction of this relationship was implicitly used in the recent framework for improper PAC learning lower bounds of Daniely and co-authors via connections to the hardness of refuting random constraint satisfaction problems. Our work can be seen as making the relationship between agnostic learning and refutation implicit in their work into an explicit equivalence.
In a recent, independent work, Salil Vadhan discovered a similar relationship between refutation and PAC learning in the realizable (i.e. noiseless) case.
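The distinguishing task in the definition of refutation complexity can be sketched with a toy example (my own illustration, not from the paper). Here the class C, the distinguisher, and the correlation threshold are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy class C: the coordinate functions h_i(x) = x_i over {-1, +1}^d.
# Given m example-label pairs, decide "structure" if some h in C correlates
# with the labels noticeably better than chance, else decide "noise".
m, d = 400, 10
X = rng.choice([-1, 1], size=(m, d))

def refute(X, y, threshold=0.2):
    correlations = np.abs(X.T @ y) / len(y)  # empirical correlation with each h_i
    return "structure" if correlations.max() > threshold else "noise"

# Case 1 (structure): labels follow a member of C, flipped with probability 0.2.
flips = rng.choice([-1, 1], size=m, p=[0.2, 0.8])
y_structured = X[:, 3] * flips

# Case 2 (noise): labels are i.i.d. Rademacher, independent of X.
y_noise = rng.choice([-1, 1], size=m)

print(refute(X, y_structured), refute(X, y_noise))
```

Refutation complexity asks for the smallest `m` at which such an efficient distinguisher succeeds; for this trivial class a simple correlation scan suffices, whereas for richer classes the abstract's equivalence ties the required `m` to the sample complexity of efficient agnostic learning.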